South Korea during COVID-19
One of the world’s most densely populated countries
51.64 million inhabitants
First case of COVID-19 confirmed on the 20th of January 2020
259 deaths caused by COVID-19
13 May, 2020
How is human behaviour driving the spread of the disease?
How has the epidemic evolved in South Korea?
Is there any correlation between the place of infection and severity of the disease?
Do gender or age predispose people to infection, or to a more severe outcome?
Can characteristic city features be used to predict the burden of disease?
Korean COVID-19 dataset (last downloaded 07-04-2020)
COVID-19 dataset from Kaggle
Project core:
Extension of ggplot:
Rendering transmission networks:
Rendering maps:
Shiny app (+ maps and geo-location packages)
Removing invalid data (NAs)
Removing unnecessary columns
Converting data into the tidy format:
Each variable has its own column
Each observation has its own row
Each value has its own cell
Joining dataset tables using full_join
Subsetting data
Combining columns using unite
Creating new variables for the analysis
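The project performs these wrangling steps with R's dplyr and tidyr (`full_join`, `unite`). As a language-neutral illustration of what those verbs do, here is a minimal Python sketch on lists of dictionaries. The helper names and toy rows are hypothetical, and unlike dplyr (which suffixes clashing columns with `.x`/`.y`) it assumes the two tables share no column other than the join key.

```python
from typing import Any

Row = dict[str, Any]


def drop_na(rows: list[Row], cols: list[str]) -> list[Row]:
    """Keep only rows in which every column in `cols` is present and not None."""
    return [r for r in rows if all(r.get(c) is not None for c in cols)]


def full_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Full outer join on `key`, in the spirit of dplyr::full_join.

    Every row from both tables is kept; columns missing on one side become None.
    Assumes no shared column besides `key` (otherwise the left value wins).
    """
    all_cols = {c for r in left + right for c in r}
    empty = dict.fromkeys(all_cols)  # every column initialised to None
    right_by_key: dict[Any, list[Row]] = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    out: list[Row] = []
    matched: set = set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched.add(lrow[key])
        # An unmatched left row is emitted once, padded with None.
        for rrow in matches or [{}]:
            out.append({**empty, **rrow, **lrow})
    # Right-hand rows whose key never appeared on the left.
    for k, rows in right_by_key.items():
        if k not in matched:
            out.extend({**empty, **r} for r in rows)
    return out


def unite(rows: list[Row], new_col: str, cols: list[str], sep: str = "_") -> list[Row]:
    """Paste `cols` into one string column and drop the originals, like tidyr::unite."""
    out = []
    for r in rows:
        kept = {k: v for k, v in r.items() if k not in cols}
        kept[new_col] = sep.join(str(r[c]) for c in cols)
        out.append(kept)
    return out


# Hypothetical toy rows, for illustration only.
patients = [
    {"patient_id": 1, "sex": "female", "province": "Seoul", "city": "Jongno-gu"},
    {"patient_id": 2, "sex": None, "province": "Daegu", "city": "Nam-gu"},
]
routes = [
    {"patient_id": 1, "visit_date": "2020-02-20"},
    {"patient_id": 3, "visit_date": "2020-02-21"},
]
complete = drop_na(patients, ["sex"])
joined = full_join(patients, routes, "patient_id")
located = unite(patients, "location", ["province", "city"])
```

A full join is the safer default while exploring, because it keeps patients without a route and routes without patient details; unmatched rows can be filtered out afterwards.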
Case data (Case)
Patient data (Patient info + Patient route)
Time data (Time + Time age + Time gender + Time province + SearchTrend)
City data (region + Patient info)
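City data pairs per-region features with patient information, so that a city's characteristics can be set against its case count. A minimal sketch of that step, assuming both tables identify a location by the same `province` and `city` columns; the function name, the placeholder locations, and the `feature` column are hypothetical.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def cases_per_city(patients: list[Row], regions: list[Row]) -> list[Row]:
    """Attach a confirmed-case count to each region row (left join on province + city).

    Regions with no matching patient get a count of 0.
    """
    counts = Counter((p["province"], p["city"]) for p in patients)
    return [
        {**reg, "confirmed": counts.get((reg["province"], reg["city"]), 0)}
        for reg in regions
    ]


# Hypothetical placeholder rows; `feature` stands in for any city characteristic.
patients = [
    {"patient_id": 1, "province": "prov_1", "city": "city_a"},
    {"patient_id": 2, "province": "prov_1", "city": "city_a"},
    {"patient_id": 3, "province": "prov_2", "city": "city_b"},
]
regions = [
    {"province": "prov_1", "city": "city_a", "feature": 1.0},
    {"province": "prov_2", "city": "city_b", "feature": 2.0},
    {"province": "prov_3", "city": "city_c", "feature": 3.0},
]
city_data = cases_per_city(patients, regions)
```

Keeping regions with zero cases matters for prediction: dropping them would bias any model toward cities that already had outbreaks.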
| score_org | score_pca |
|---|---|
| 42.5% | 49.6% |
| accuracy |
|---|
| 46.4% |
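The accuracy figures above are easiest to interpret against simple baselines. A minimal sketch, assuming the 4-class setting mentioned in the conclusions: guessing uniformly at random scores 1/4 = 25% in expectation whatever the class distribution, while always predicting the most frequent class scores that class's share, which can be much higher when classes are imbalanced. The function names and toy labels are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def uniform_baseline(n_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random among n_classes labels."""
    return 1 / n_classes


def majority_baseline(y_true: Sequence[Hashable]) -> float:
    """Accuracy of always predicting the most frequent true label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Hypothetical toy labels for a 4-class problem.
y_true = ["a", "a", "a", "b", "c", "d"]
y_pred = ["a", "b", "a", "b", "d", "d"]
acc = accuracy(y_true, y_pred)
```

If the majority-class baseline is already close to a model's accuracy, "better than random" says little; reporting both baselines next to the score avoids that trap.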
Confirmed cases far outnumber deaths.
We found no correlation between the place of infection and the severity of the disease.
More women than men are confirmed as cases, but more men die.
Young people are driving the spread.
People in their 70s and 80s have a higher fatality rate. There are clusters of superspreaders within certain age ranges.
Accuracy is around 50%: better than random guessing among 4 classes (which would score 25% on average) and similar to the performance of k-means.
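Comparisons of fatality across age groups (and between men and women) rest on the case fatality rate, deceased divided by confirmed within each group. A minimal sketch, assuming cumulative (confirmed, deceased) counts per group on a single date, such as those in the Time age and Time gender tables listed above; the function name and the numbers are hypothetical.

```python
def case_fatality_rate(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to deceased / confirmed, given (confirmed, deceased) counts.

    Groups with zero confirmed cases are skipped to avoid division by zero.
    """
    return {
        group: deceased / confirmed
        for group, (confirmed, deceased) in counts.items()
        if confirmed > 0
    }


# Hypothetical cumulative counts on one date: group -> (confirmed, deceased).
latest = {"20s": (100, 0), "50s": (80, 1), "80s": (40, 8), "0s": (0, 0)}
cfr = case_fatality_rate(latest)
```

Because deaths lag confirmations, a rate computed from same-day cumulative counts during an ongoing epidemic tends to underestimate the final fatality rate.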
Figure: PCA variance explained
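The PCA variance-explained figure summarises how much of the total variance each principal component captures. Using the standard definitions, with notation introduced here: for centred data with sample covariance matrix $\Sigma$ whose eigenvalues are $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$, the explained variance ratio of component $k$ and the cumulative ratio of the first $m$ components are

$$
\mathrm{EVR}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} = \frac{\lambda_k}{\operatorname{tr}(\Sigma)},
\qquad
\mathrm{CEVR}_m = \sum_{k=1}^{m} \mathrm{EVR}_k .
$$

The denominator equals the total variance, so the $\mathrm{EVR}_k$ sum to 1 and $\mathrm{CEVR}_m$ shows how many components are needed to keep a chosen share of the variance before clustering or classification.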